AITopics | proxy data

proxy data

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

SHED: Shapley-Based Automated Dataset Refinement for Instruction Fine-Tuning

Yexiao He

Neural Information Processing Systems | Feb-17-2026, 14:32:35 GMT

The pre-trained Large Language Models (LLMs) can be adapted for many downstream tasks and tailored to align with human preferences through fine-tuning.

large language model, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country:

North America > United States > Maryland (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.69)

Multi-Task Learning from Summary Statistics: Supplementary Materials (A Proofs, A.1 Preliminaries)

Neural Information Processing Systems | Feb-16-2026, 10:18:07 GMT

The proofs of Lemmas A.2 and A.3 follow from covering arguments and Bernstein's inequality; refer

artificial intelligence, estimator, machine learning, (18 more...)

Neural Information Processing Systems

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.05)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

a924b7178e5975dfed1de235f0b72973-Paper-Conference.pdf

Neural Information Processing Systems | Feb-16-2026, 10:18:04 GMT

artificial intelligence, machine learning, summary statistics, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Suffolk County > Boston (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > Montserrat (0.04)
(3 more...)

Genre: Research Report > New Finding (0.68)

Industry: Health & Medicine > Health Care Technology > Medical Record (0.46)

Technology:

Information Technology > Data Science (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

Black-Box Ripper

Neural Information Processing Systems | Feb-10-2026, 22:25:58 GMT

In this context, we present a teacher-student framework that can distill the black-box (teacher) model into a student model with minimal accuracy loss.

artificial intelligence, inproceedingsoficlr, machine learning, (19 more...)

Neural Information Processing Systems

Country:

Europe > Romania (0.05)
North America > Canada > Ontario > Toronto (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

SHED: Shapley-Based Automated Dataset Refinement for Instruction Fine-Tuning

Yexiao He

Neural Information Processing Systems | Oct-10-2025, 13:55:52 GMT

The pre-trained Large Language Models (LLMs) can be adapted for many downstream tasks and tailored to align with human preferences through fine-tuning.

contribution, dataset, shapley value, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Maryland (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.69)

Multi-Task Learning with Summary Statistics

Neural Information Processing Systems | Oct-9-2025, 04:08:58 GMT

However, the application of multi-task learning to real-world settings is hindered by data-sharing constraints, especially in healthcare settings.

artificial intelligence, machine learning, summary statistics, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Suffolk County > Boston (0.04)
North America > Montserrat (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(2 more...)

Genre: Research Report > New Finding (0.68)

Industry: Health & Medicine > Health Care Technology > Medical Record (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

Federated Distillation on Edge Devices: Efficient Client-Side Filtering for Non-IID Data

Mujtaba, Ahmed, Radchenko, Gleb, Prodan, Radu, Masana, Marc

arXiv.org Artificial Intelligence | Aug-21-2025

Federated distillation has emerged as a promising collaborative machine learning approach, offering enhanced privacy protection and reduced communication compared to traditional federated learning by exchanging model outputs (soft logits) rather than full model parameters. However, existing methods employ complex selective knowledge-sharing strategies that require clients to identify in-distribution proxy data through computationally expensive statistical density ratio estimators. Additionally, server-side filtering of ambiguous knowledge introduces latency to the process. To address these challenges, we propose a robust, resource-efficient EdgeFD method that reduces the complexity of the client-side density ratio estimation and removes the need for server-side filtering. EdgeFD introduces an efficient KMeans-based density ratio estimator for effectively filtering both in-distribution and out-of-distribution proxy data on clients, significantly improving the quality of knowledge sharing. We evaluate EdgeFD across diverse practical scenarios, including strong non-IID, weak non-IID, and IID data distributions on clients, without requiring a pre-trained teacher model on the server for knowledge distillation. Experimental results demonstrate that EdgeFD outperforms state-of-the-art methods, consistently achieving accuracy levels close to IID scenarios even under heterogeneous and challenging conditions. The significantly reduced computational overhead of the KMeans-based estimator is suitable for deployment on resource-constrained edge devices, thereby enhancing the scalability and real-world applicability of federated distillation. The code is available online for reproducibility.
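The abstract does not spell out how the KMeans-based estimator works, so the following is only a minimal sketch of the general idea of client-side proxy filtering, not the EdgeFD method itself. It assumes a simple stand-in rule: a client clusters its private data, and a proxy sample counts as in-distribution when its distance to the nearest centroid is within a threshold taken from the private data. All names here (`kmeans`, `fit_filter`, `filter_proxy`, the `quantile` parameter) are hypothetical.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's algorithm on a list of coordinate tuples; returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster is empty
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return centroids


def nearest_distance(p, centroids):
    """Distance from p to its closest centroid."""
    return min(math.dist(p, c) for c in centroids)


def fit_filter(private_points, k=2, quantile=0.95):
    """Fit centroids on private data and derive a distance threshold.

    Assumption: the threshold is the given quantile of the private
    samples' own nearest-centroid distances.
    """
    centroids = kmeans(private_points, k)
    dists = sorted(nearest_distance(p, centroids) for p in private_points)
    threshold = dists[min(len(dists) - 1, int(quantile * len(dists)))]
    return centroids, threshold


def filter_proxy(proxy_points, centroids, threshold):
    """Split proxy samples into (in_distribution, out_of_distribution)."""
    in_dist, out_dist = [], []
    for p in proxy_points:
        if nearest_distance(p, centroids) <= threshold:
            in_dist.append(p)
        else:
            out_dist.append(p)
    return in_dist, out_dist


# Toy usage: private data drawn around (0, 0) and (5, 5).
rng = random.Random(1)
private = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(50)]
private += [(rng.gauss(5, 0.5), rng.gauss(5, 0.5)) for _ in range(50)]
centroids, threshold = fit_filter(private, k=2)
proxy = [(0.1, -0.2), (5.2, 4.9), (20.0, 20.0)]
in_dist, out_dist = filter_proxy(proxy, centroids, threshold)
print("in-distribution:", in_dist)
print("out-of-distribution:", out_dist)
```

In a federated distillation round, a client under this sketch would share soft outputs only for the proxy samples in `in_dist`. The distance threshold is a crude proxy for a density ratio and is chosen here purely for brevity.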

artificial intelligence, conv2d, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2508.14769

Country: Europe > Austria (0.30)

Genre: Research Report > Promising Solution (0.34)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Black-Box Ripper: Copying black-box models using generative evolutionary algorithms

Antonio Bărbălău

Neural Information Processing Systems | Aug-17-2025, 02:09:19 GMT

In this context, we present a teacher-student framework that can distill the black-box (teacher) model into a student model with minimal accuracy loss.

black-box ripper, proceedings, proxy data, (15 more...)

Neural Information Processing Systems

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > Romania > București - Ilfov Development Region > Municipality of Bucharest > Bucharest (0.04)

Genre: Research Report (0.68)

Industry:

Transportation > Air (1.00)
Education (0.89)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Using Imperfect Synthetic Data in Downstream Inference Tasks

Byun, Yewon, Gupta, Shantanu, Lipton, Zachary C., Childers, Rachel Leah, Wilder, Bryan

arXiv.org Machine Learning | Aug-12-2025

Predictions and generations from large language models are increasingly being explored as an aid to computational social science and human subject research in limited data regimes. While previous technical work has explored the potential to use model-predicted labels for unlabeled data in a principled manner, there is increasing interest in using large language models to generate entirely new synthetic samples (also termed as synthetic simulations), such as in responses to surveys. However, it is not immediately clear by what means practitioners can combine such data with real data and yet produce statistically valid conclusions upon them. In this work, we introduce a new estimator based on generalized method of moments, providing a hyperparameter-free solution with strong theoretical guarantees to address the challenge at hand. Surprisingly, we find that interactions between the moment residuals of synthetic data and those of real data can improve estimates of the target parameter. We empirically validate the finite-sample performance of our estimator across different regression tasks in computational social science applications, demonstrating large empirical gains.

large language model, machine learning, natural language, (19 more...)

arXiv.org Machine Learning

2508.06635

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Europe > Switzerland > Zürich > Zürich (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
